Fact-based Text Editing
We propose a novel text editing task, referred to as fact-based text
editing, in which the goal is to revise a given document to better describe
the facts in a knowledge base (e.g., several triples). The task is important in
practice because reflecting the truth is a common requirement in text editing.
First, we propose a method for automatically generating a dataset for research
on fact-based text editing, where each instance consists of a draft text, a
revised text, and several facts represented as triples. We apply the method
to two public table-to-text datasets, obtaining two new datasets consisting
of 233k and 37k instances, respectively. Next, we propose a new neural network
architecture for fact-based text editing, called FactEditor, which
edits a draft text by referring to the given facts using a buffer, a stream,
and a memory. A straightforward approach to this problem would be to employ an
encoder-decoder model. Our experimental results on the two datasets show that
FactEditor outperforms the encoder-decoder approach in terms of
fidelity and fluency. The results also show that FactEditor conducts
inference faster than the encoder-decoder approach.
Comment: ACL 2020
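To make the buffer/stream/memory idea concrete, here is a toy Python sketch, not the authors' implementation: the draft sits in a buffer, kept tokens are emitted to a stream, and the triples serve as a memory. The keep/drop/generate policy below is a hard-coded heuristic standing in for FactEditor's learned model, and the example data is invented.

```python
# Toy sketch only: FactEditor learns this editing policy with a neural network.
from collections import deque

def fact_edit(draft_tokens, triples):
    buffer = deque(draft_tokens)                    # unprocessed draft tokens
    stream = []                                     # revised text built so far
    memory = {s for s, _, _ in triples} | {o for _, _, o in triples}  # fact terms

    while buffer:
        token = buffer.popleft()
        # Heuristic stand-in for the learned action classifier:
        # DROP capitalized tokens that no triple supports, KEEP everything else.
        if token.istitle() and token not in memory:
            continue                                # DROP
        stream.append(token)                        # KEEP

    # GEN (simplified): mention fact terms the draft omitted; a real model
    # would insert them fluently in place rather than appending at the end.
    stream.extend(term for term in memory if term not in stream)
    return " ".join(stream)

triples = [("Eagle", "food", "French"), ("Eagle", "area", "riverside")]
draft = "Eagle serves Italian food near the riverside".split()
print(fact_edit(draft, triples))  # "Italian" is dropped, "French" is generated
```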
XATU: A Fine-grained Instruction-based Benchmark for Explainable Text Updates
Text editing is a crucial task that involves modifying text to better align
with user intents. However, existing text editing benchmark datasets provide
only coarse-grained instructions. Consequently,
although the edited output may seem reasonable, it often deviates from the
intended changes outlined in the gold reference, resulting in low evaluation
scores. To comprehensively investigate the text editing capabilities of large
language models, this paper introduces XATU, the first benchmark specifically
designed for fine-grained instruction-based explainable text editing. XATU
covers a wide range of topics and text types, incorporating lexical, syntactic,
semantic, and knowledge-intensive edits. To enhance interpretability, we
leverage high-quality data sources and human annotation, resulting in a
benchmark that includes fine-grained instructions and gold-standard edit
explanations. By evaluating existing open and closed large language models
against our benchmark, we demonstrate the effectiveness of instruction tuning
and the impact of the underlying architecture across various editing tasks.
Furthermore, extensive experimentation reveals the significant role of
explanations in fine-tuning language models for text editing tasks. The
benchmark will be open-sourced to support reproduction and facilitate future
research.
Comment: Work in progress
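As a rough illustration of how such a benchmark might be used, here is a hypothetical Python sketch: the field names, prompt wording, placeholder call_llm function, and crude word-level metric are assumptions for illustration, not the released XATU format or an official evaluation script.

```python
# Hypothetical evaluation loop for fine-grained instruction-based text editing.
import difflib

def build_prompt(source: str, instruction: str) -> str:
    return (
        "Edit the text according to the instruction.\n"
        f"Instruction: {instruction}\n"
        f"Text: {source}\n"
        "Edited text:"
    )

def call_llm(prompt: str) -> str:
    """Placeholder: swap in a real model call (an API client or HF pipeline)."""
    return "She has gone to the store."

def word_similarity(hypothesis: str, reference: str) -> float:
    """Crude word-level similarity; real evaluations use edit metrics or humans."""
    return difflib.SequenceMatcher(None, hypothesis.split(), reference.split()).ratio()

instance = {
    "source": "She has went to the store.",
    "instruction": "Fix the verb tense error in 'has went'.",
    "reference": "She has gone to the store.",
}
output = call_llm(build_prompt(instance["source"], instance["instruction"]))
print(round(word_similarity(output, instance["reference"]), 2))
```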
Less is More for Long Document Summary Evaluation by LLMs
Large Language Models (LLMs) have shown promising performance in summary
evaluation tasks, yet they face challenges such as high computational costs and
the Lost-in-the-Middle problem, in which important information in the middle of
long documents is often overlooked. To address these issues, this paper
introduces a novel approach, Extract-then-Evaluate, which involves extracting
key sentences from a long source document and then evaluating the summary by
prompting LLMs. The results reveal that the proposed method not only
significantly reduces evaluation costs but also exhibits a higher correlation
with human evaluations. Furthermore, we provide practical recommendations for
optimal document length and sentence extraction methods, contributing to the
development of cost-effective yet more accurate methods for LLM-based text
generation evaluation.
Comment: Work in progress
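A minimal sketch of the Extract-then-Evaluate idea follows, under stated assumptions: key sentences are ranked by word overlap with the summary (one of several possible extraction methods; the paper studies which work best and at what length budget), and build_eval_prompt only constructs the evaluation prompt, standing in for an actual LLM call.

```python
# Sketch: extract a few key source sentences, then evaluate the summary with an LLM.
import re

def split_sentences(text: str) -> list[str]:
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def extract_key_sentences(document: str, summary: str, budget: int = 3) -> list[str]:
    summary_words = set(summary.lower().split())
    ranked = sorted(split_sentences(document),
                    key=lambda s: len(summary_words & set(s.lower().split())),
                    reverse=True)
    return ranked[:budget]

def build_eval_prompt(summary: str, evidence: list[str]) -> str:
    # Placeholder: send this prompt to an LLM and parse the numeric score.
    return ("Rate the summary from 1 to 5 for faithfulness to the source sentences.\n"
            "Source sentences:\n- " + "\n- ".join(evidence) +
            f"\nSummary: {summary}\nScore:")

document = ("The council approved the new bridge on Monday. Funding comes from a state "
            "grant. Construction is expected to take two years. Local shops worry about "
            "noise. The mayor praised the decision.")
summary = "The council approved a state-funded bridge that will take two years to build."
print(build_eval_prompt(summary, extract_key_sentences(document, summary)))
```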
Distilling Large Language Models using Skill-Occupation Graph Context for HR-Related Tasks
Numerous HR applications are centered around resumes and job descriptions.
While they can benefit from advancements in NLP, particularly large language
models, their real-world adoption faces challenges due to the absence of
comprehensive benchmarks for various HR tasks and the lack of smaller models
with competitive capabilities. In this paper, we aim to bridge this gap by
introducing the Resume-Job Description Benchmark (RJDB). We meticulously craft
this benchmark to cater to a wide array of HR tasks, including matching resumes
to job descriptions and explaining the matches, extracting skills and
experiences from resumes, and editing resumes. To create this benchmark, we
propose to distill
domain-specific knowledge from a large language model (LLM). We rely on a
curated skill-occupation graph to ensure diversity and to provide context for
LLM generation. Our benchmark includes over 50 thousand triples of job
descriptions, matched resumes and unmatched resumes. Using RJDB, we train
multiple smaller student models. Our experiments reveal that the student models
achieve performance close to or better than the teacher model (GPT-4), affirming the
effectiveness of the benchmark. Additionally, we explore the utility of RJDB on
out-of-distribution data for skill extraction and resume-job description
matching, in zero-shot and weakly supervised settings. We release our datasets and
code to foster further research and industry applications.
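The sketch below illustrates, very loosely, how a skill-occupation graph could supply context for teacher-LLM generation; the example graph edges, prompt wording, and generate() placeholder are all invented here and do not reflect the paper's actual pipeline or data.

```python
# Hypothetical graph-grounded data generation: neighboring skills of an occupation
# node provide context for prompting a teacher LLM to draft a job description and
# a matched resume.
skill_occupation_graph = {
    "Data Engineer": ["SQL", "Spark", "Airflow", "Python"],
    "UX Designer": ["Figma", "user research", "prototyping"],
}

def build_generation_prompt(occupation: str, graph: dict[str, list[str]]) -> str:
    skills = ", ".join(graph[occupation])
    return (
        f"Write a short job description for a {occupation} requiring: {skills}.\n"
        "Then write a matching resume that demonstrates those skills."
    )

def generate(prompt: str) -> str:
    """Placeholder for a teacher-LLM call (e.g., GPT-4 in the paper)."""
    return "<synthetic job description and resume>"

for occupation in skill_occupation_graph:
    print(generate(build_generation_prompt(occupation, skill_occupation_graph)))
```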
Zero-shot Triplet Extraction by Template Infilling
The task of triplet extraction aims to extract pairs of entities and their
corresponding relations from unstructured text. Most existing methods train an
extraction model on training data involving specific target relations, and are
incapable of extracting new relations that were not observed at training time.
Generalizing the model to unseen relations typically requires fine-tuning on
synthetic training data which is often noisy and unreliable. We show that by
reducing triplet extraction to a template infilling task over a pre-trained
language model (LM), we can equip the extraction model with zero-shot learning
capabilities and eliminate the need for additional training data. We propose a
novel framework, ZETT (ZEro-shot Triplet extraction by Template infilling),
that aligns the task objective to the pre-training objective of generative
transformers to generalize to unseen relations. Experiments on FewRel and
Wiki-ZSL datasets demonstrate that ZETT shows consistent and stable
performance, outperforming previous state-of-the-art methods, even when using
automatically generated templates. Code: https://github.com/megagonlabs/zett/
Comment: IJCNLP-AACL 2023 (main)
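The following sketch shows template infilling with a generic text-to-text model (t5-small via Hugging Face transformers) in the spirit of ZETT; the relation templates and example sentence are made up here, and the released code at the URL above should be consulted for the actual method, which also ranks candidate relations by model likelihood.

```python
# Illustrative only: an off-the-shelf t5-small without ZETT-style fine-tuning will
# produce noisy fills; the point is the input/output format of template infilling.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Hypothetical relation templates; entity slots use T5 sentinel tokens.
templates = {
    "founded_by": "<extra_id_0> was founded by <extra_id_1>.",
    "headquartered_in": "<extra_id_0> is headquartered in <extra_id_1>.",
}

sentence = "SpaceX was founded in 2002 by Elon Musk in Hawthorne, California."

for relation, template in templates.items():
    inputs = tokenizer(f"{sentence} {template}", return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=16)
    # The spans decoded for <extra_id_0> / <extra_id_1> become the candidate
    # head and tail entities for this relation.
    print(relation, tokenizer.decode(output_ids[0], skip_special_tokens=False))
```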
Learning to Select, Track, and Generate for Data-to-Text
We propose a data-to-text generation model with two modules, one for tracking
and the other for text generation. Our tracking module selects and keeps track
of salient information and memorizes which record has been mentioned. Our
generation module generates a summary conditioned on the state of the tracking
module. Our model can be viewed as simulating the human writing process, in
which a writer gradually selects information by determining intermediate
variables while writing the summary. In addition, we explore the effectiveness
of writer information for generation. Experimental results show that our model
outperforms existing models in all evaluation metrics even without writer
information. Incorporating writer information further improves the performance,
contributing to content planning and surface realization.
Comment: ACL 2019
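To show the select-and-track idea at a glance, here is a toy Python sketch, not the authors' neural model: saliency scores are hard-coded, the tracker remembers which records have already been mentioned, and realize() is a template stand-in for the learned generation module.

```python
# Toy select-track-generate loop over data records (invented example data).
records = [
    {"entity": "Smith", "type": "points", "value": "28", "salience": 0.9},
    {"entity": "Jones", "type": "assists", "value": "11", "salience": 0.7},
    {"entity": "Smith", "type": "fouls", "value": "2", "salience": 0.2},
]

def select_next(records, mentioned):
    """Tracker: pick the most salient record that has not been mentioned yet."""
    candidates = [r for i, r in enumerate(records) if i not in mentioned]
    return max(candidates, key=lambda r: r["salience"]) if candidates else None

def realize(record):
    """Placeholder surface realizer; the real generator is a conditioned decoder."""
    return f'{record["entity"]} recorded {record["value"]} {record["type"]}.'

mentioned, summary = set(), []
while True:
    record = select_next(records, mentioned)
    if record is None or record["salience"] < 0.5:   # stop on low salience
        break
    mentioned.add(records.index(record))             # remember what was mentioned
    summary.append(realize(record))
print(" ".join(summary))
```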